
Fix 32-bit decoding with large dictionary #3472

Merged
merged 1 commit into facebook:dev on Feb 2, 2023

Conversation

@terrelln commented Feb 2, 2023

The 32-bit decoder could corrupt the regenerated data by using regular offset mode when there were actually long offsets. This is because we were only considering the window size in the calculation, not the dictionary size. So a large dictionary could allow longer offsets.
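For context, a rough sketch of the decision this paragraph describes, assuming the 32-bit bit-reader can only guarantee STREAM_ACCUMULATOR_MIN_32 (25) bits per reload; the function name below is hypothetical and simplified, not the actual zstd identifier:

```c
#include <stddef.h>

/* Bits the bit-reader can guarantee per reload on 32-bit builds (zstd's
 * STREAM_ACCUMULATOR_MIN_32). An offset code needing more additional bits
 * than this cannot be read in a single pass and requires the dedicated
 * long-offsets decoding path. */
#define STREAM_ACCUMULATOR_MIN_32 25

/* Buggy gate (sketch): only the window size is consulted. With a large
 * dictionary, the referenceable history can be larger than the window size,
 * so offsets can exceed this bound even when the window alone would not
 * require the long-offsets decoder. */
static int use_long_offsets_old(size_t window_size)
{
    return window_size > ((size_t)1 << STREAM_ACCUMULATOR_MIN_32);
}
```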

Fix this in two ways:

  1. Instead of looking at the window size, look at the total referenceable bytes in the history buffer, and use that in the comparison instead of the window size. Additionally, we were comparing against the wrong value; it was too low. Fix that by computing exactly the maximum offset for regular sequence decoding.
  2. If long offsets are still possible after (1), check the offset code decoding table: if the table's maximum number of additional bits is no more than STREAM_ACCUMULATOR_MIN, then we can't have long offsets.

Together, these checks ensure we use the long-offsets decoder only when we are very likely to actually have long offsets.
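A minimal sketch of that two-step gating, assuming `total_history` is the number of bytes the block can reference (previously decoded output plus dictionary content) and `of_max_extra_bits` is the largest "number of additional bits" entry in the block's offset-code decoding table; the names and the simplified regular-offset bound are illustrative, not the real zstd code:

```c
#include <stddef.h>

#define STREAM_ACCUMULATOR_MIN_32 25  /* guaranteed bits per reload on 32-bit */

typedef enum { REGULAR_OFFSETS, LONG_OFFSETS } offset_mode;

static offset_mode choose_offset_decoder(size_t total_history,
                                         unsigned of_max_extra_bits)
{
    /* Step 1: if nothing referenceable lies beyond the largest offset the
     * regular decoder can handle, long offsets are impossible. The bound is
     * simplified here; the real fix computes the exact maximum offset for
     * regular sequence decoding. */
    size_t const max_regular_offset = (size_t)1 << STREAM_ACCUMULATOR_MIN_32;
    if (total_history <= max_regular_offset)
        return REGULAR_OFFSETS;

    /* Step 2: even with a large history, the block's offset decoding table
     * may never emit a code needing more than STREAM_ACCUMULATOR_MIN_32
     * additional bits, in which case long offsets still cannot occur. */
    if (of_max_extra_bits <= STREAM_ACCUMULATOR_MIN_32)
        return REGULAR_OFFSETS;

    return LONG_OFFSETS;
}
```

The question only arises for 32-bit decoding, where the accumulator guarantee is 25 bits; on 64-bit builds the accumulator is wide enough that every legal offset code can be read in a single pass.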

Note that this bug only affects decoding: the original compressed data, if re-read with a patched decoder, will correctly regenerate the original data. The one exception is that the encoder previously had the same issue.

This fixes both of the open OSS-Fuzz issues.

Credit to OSS-Fuzz

@terrelln terrelln force-pushed the 2023-02-01-fix-32-bit-decoding branch 3 times, most recently from 4e7a209 to cc3e3ac on February 2, 2023 01:13
@Cyan4973 Cyan4973 merged commit c22c995 into facebook:dev Feb 2, 2023
Cyan4973 commented Feb 2, 2023

arf, I just meant to approve the PR, but got it merged instead...
Anyway, I believe it's good to go.

terrelln commented Feb 2, 2023

arf, I just meant to approve the PR, but got it merged instead...

No worries! It's a good thing, because that means today's fuzzer builds picked up the fix.
